220 research outputs found

    Studying bacterial transcriptomes using RNA-seq

    Genome-wide studies of bacterial gene expression are shifting from microarray technology to second generation sequencing platforms. RNA-seq has a number of advantages over hybridization-based techniques, such as annotation-independent detection of transcription, improved sensitivity and increased dynamic range. Early studies have uncovered a wealth of novel coding sequences and non-coding RNA, and are revealing a transcriptional landscape that increasingly mirrors that of eukaryotes. Already basic RNA-seq protocols have been improved and adapted to looking at particular aspects of RNA biology, often with an emphasis on non-coding RNAs, and further refinements to current techniques will improve our understanding of gene expression, and genome content, in the future.

    Identification, variation and transcription of pneumococcal repeat sequences.

    BACKGROUND: Small interspersed repeats are commonly found in many bacterial chromosomes. Two families of repeats (BOX and RUP) have previously been identified in the genome of Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen of humans. However, little is known about the role they play in pneumococcal genetics. RESULTS: Analysis of the genome of S. pneumoniae ATCC 700669 revealed the presence of a third repeat family, which we have named SPRITE. All three repeats are present at a reduced density in the genome of the closely related species S. mitis. However, they are almost entirely absent from all other streptococci, although a set of elements related to the pneumococcal BOX repeat was identified in the zoonotic pathogen S. suis. In conjunction with information regarding their distribution within the pneumococcal chromosome, this suggests that it is unlikely that these repeats are specialised sequences performing a particular role for the host, but rather that they constitute parasitic elements. However, comparing insertion sites between pneumococcal sequences indicates that they appear to transpose at a much lower rate than IS elements. Some large BOX elements in S. pneumoniae were found to encode open reading frames on both strands of the genome, whilst another was found to form a composite RNA structure with two T box riboswitches. In multiple cases, such BOX elements were demonstrated as being expressed using directional RNA-seq and RT-PCR. CONCLUSIONS: BOX, RUP and SPRITE repeats appear to have proliferated extensively throughout the pneumococcal chromosome during the species' past, but novel insertions are currently occurring at a relatively slow rate. Through their extensive secondary structures, they seem likely to affect the expression of genes with which they are co-transcribed. Software for annotation of these repeats is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/strep_repeats/

    Bayesian inference of ancestral dates on bacterial phylogenetic trees

    The sequencing and comparative analysis of a collection of bacterial genomes from a single species or lineage of interest can lead to key insights into its evolution, ecology or epidemiology. The tool of choice for such a study is often to build a phylogenetic tree, and more specifically when possible a dated phylogeny, in which the dates of all common ancestors are estimated. Here, we propose a new Bayesian methodology to construct dated phylogenies which is specifically designed for bacterial genomics. Unlike previous Bayesian methods aimed at building dated phylogenies, we consider that the phylogenetic relationships between the genomes have been previously evaluated using a standard phylogenetic method, which makes our methodology much faster and scalable. This two-step approach also allows us to directly exploit existing phylogenetic methods that detect bacterial recombination, and therefore to account for the effect of recombination in the construction of a dated phylogeny. We analysed many simulated datasets in order to benchmark the performance of our approach in a wide range of situations. Furthermore, we present applications to three different real datasets from recent bacterial genomic studies. Our methodology is implemented in an R package called BactDating which is freely available for download at https://github.com/xavierdidelot/BactDating

    Heterogeneity in the Frequency and Characteristics of Homologous Recombination in Pneumococcal Evolution

    The bacterium Streptococcus pneumoniae (pneumococcus) is one of the most important human bacterial pathogens, and a leading cause of morbidity and mortality worldwide. The pneumococcus is also known for undergoing extensive homologous recombination via transformation with exogenous DNA. It has been shown that recombination has a major impact on the evolution of the pathogen, including acquisition of antibiotic resistance and serotype-switching. Nevertheless, the mechanism and the rates of recombination in an epidemiological context remain poorly understood. Here, we proposed several mathematical models to describe the rate and size of recombination in the evolutionary history of two very distinct pneumococcal lineages, PMEN1 and CC180. We found that, in both lineages, the process of homologous recombination was best described by a heterogeneous model of recombination with single, short, frequent replacements, which we call micro-recombinations, and rarer, multi-fragment, saltational replacements, which we call macro-recombinations. Macro-recombination was associated with major phenotypic changes, including serotype-switching events, and thus was a major driver of the diversification of the pathogen. We critically evaluate biological and epidemiological processes that could give rise to the micro-recombination and macro-recombination processes.

    Genome-wide association, prediction and heritability in bacteria with application to Streptococcus pneumoniae

    Whole-genome sequencing has facilitated genome-wide analyses of association, prediction and heritability in many organisms. However, such analyses in bacteria are still in their infancy, being limited by difficulties including genome plasticity and strong population structure. Here we propose a suite of methods including linear mixed models, elastic net and LD-score regression, adapted to bacterial traits using innovations such as frequency-based allele coding, both insertion/deletion and nucleotide testing and heritability partitioning. We compare and validate our methods against the current state-of-art using simulations, and analyse three phenotypes of the major human pathogen Streptococcus pneumoniae, including the first analyses of minimum inhibitory concentrations (MIC) for penicillin and ceftriaxone. We show that the MIC traits are highly heritable with high prediction accuracy, explained by many genetic associations under good population structure control. In ceftriaxone MIC, this is surprising because none of the isolates are resistant as per the inhibition zone criteria. We estimate that half of the heritability of penicillin MIC is explained by a known drug-resistance region, which also contributes a quarter of the ceftriaxone MIC heritability. For the within-host carriage duration phenotype, no associations were observed, but the moderate heritability and prediction accuracy indicate a moderately polygenic trait.

    PANINI: Pangenome Neighbour Identification for Bacterial Populations

    The standard workhorse for genomic analysis of the evolution of bacterial populations is phylogenetic modelling of mutations in the core genome. However, a notable amount of information about evolutionary and transmission processes in diverse populations can be lost unless the accessory genome is also taken into consideration. Here, we introduce PANINI (Pangenome Neighbour Identification for Bacterial Populations), a computationally scalable method for identifying the neighbours for each isolate in a data set using unsupervised machine learning with stochastic neighbour embedding based on the t-SNE (t-distributed stochastic neighbour embedding) algorithm. PANINI is browser-based and integrates with the Microreact platform for rapid online visualization and exploration of both core and accessory genome evolutionary signals, together with relevant epidemiological, geographical, temporal and other metadata. Several case studies with single- and multi-clone pneumococcal populations are presented to demonstrate the ability to identify biologically important signals from gene content data. PANINI is available at http://panini.pathogen.watch and code at http://gitlab.com/cgps/panini.

    RCandy: an R package for visualizing homologous recombinations in bacterial genomes

    SUMMARY: Homologous recombination is an important evolutionary process in bacteria and other prokaryotes, which increases genomic sequence diversity and can facilitate adaptation. Several methods and tools have been developed to detect genomic regions recently affected by recombination. Exploration and visualization of such recombination events can reveal valuable biological insights, but it remains challenging. Here, we present RCandy, a platform-independent R package for rapid, simple and flexible visualization of recombination events in bacterial genomes. AVAILABILITY AND IMPLEMENTATION: RCandy is an R package freely available for use under the MIT license. It is platform-independent and has been tested on Windows, Linux and MacOSX. The source code comes together with a detailed vignette available on GitHub at https://github.com/ChrispinChaguza/RCandy. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    Genome-wide identification of lineage and locus specific variation associated with pneumococcal carriage duration.

    Streptococcus pneumoniae is a leading cause of invasive disease in infants, especially in low-income settings. Asymptomatic carriage in the nasopharynx is a prerequisite for disease, but variability in its duration is currently only understood at the serotype level. Here we developed a model to calculate the duration of carriage episodes from longitudinal swab data, and combined these results with whole genome sequence data. We estimated that pneumococcal genomic variation accounted for 63% of the phenotype variation, whereas the host traits considered here (age and previous carriage) accounted for less than 5%. We further partitioned this heritability into both lineage and locus effects, and quantified the amount attributable to the largest sources of variation in carriage duration: serotype (17%), drug-resistance (9%) and other significant locus effects (7%). A pan-genome-wide association study identified prophage sequences as being associated with decreased carriage duration independent of serotype, potentially by disruption of the competence mechanism. These findings support theoretical models of pneumococcal competition and antibiotic resistance.